Prediction
Predictions
Predictor Classes
Decoding Strategies
eole.predict.greedy_search.sample_with_temperature(logits, temperature, top_k, top_p)
Select next tokens randomly from the top k possible next tokens.
Samples from a categorical distribution over the top_k words using the category probabilities logits / temperature.
- Parameters:
  - logits (FloatTensor) – Shaped (batch_size, vocab_size). These can be logits ((-inf, inf)) or log-probs ((-inf, 0]). (The distribution actually uses the log-probabilities logits - logits.logsumexp(-1), which equal the logits if they are already normalized log-probabilities.)
  - temperature (float) – Used to scale down logits. The higher the value, the more likely it is that a non-max word will be sampled.
  - top_k (int) – This many words could potentially be chosen. The other logits are set to have probability 0.
  - top_p (float) – Keep the most likely words until their cumulative probability exceeds p. If used with top_k, both conditions are applied.
- Returns:
  - topk_ids: Shaped (batch_size, 1). These are the sampled word indices in the output vocab.
  - topk_scores: Shaped (batch_size, 1). These are essentially (logits / temperature)[topk_ids].
- Return type: (LongTensor, FloatTensor)
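To make the interaction of temperature, top_k, and top_p concrete, here is a minimal, self-contained sketch of this style of sampling. It is an illustration written against the description above, not the eole implementation; the function name and masking details are the author's assumptions.

```python
import torch

def sample_with_temperature_sketch(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Illustrative top-k / top-p temperature sampling; not the eole code."""
    logits = logits / temperature

    if top_k > 0:
        # Keep only the top_k logits per row; everything else gets probability 0.
        kth_best = torch.topk(logits, top_k, dim=-1).values[..., -1, None]
        logits = logits.masked_fill(logits < kth_best, float("-inf"))

    if top_p < 1.0:
        # Nucleus filtering: drop tokens once cumulative probability exceeds top_p.
        sorted_logits, sorted_idx = torch.sort(logits, descending=True, dim=-1)
        cum_probs = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
        remove = cum_probs > top_p
        remove[..., 1:] = remove[..., :-1].clone()  # always keep the best token
        remove[..., 0] = False
        remove = remove.scatter(-1, sorted_idx, remove)
        logits = logits.masked_fill(remove, float("-inf"))

    probs = torch.softmax(logits, dim=-1)
    topk_ids = torch.multinomial(probs, num_samples=1)   # (batch_size, 1), LongTensor
    topk_scores = logits.gather(dim=1, index=topk_ids)   # (batch_size, 1), FloatTensor
    return topk_ids, topk_scores
```

With top_k=0 and top_p=1.0 this reduces to plain temperature sampling over the full vocabulary.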
Scoring
class eole.predict.penalties.PenaltyBuilder(cov_pen, length_pen)
Bases: object
Returns the Length and Coverage Penalty function for Beam Search.
- Parameters:
  - length_pen (str) – option name of the length penalty
  - cov_pen (str) – option name of the coverage penalty
- Variables:
  - has_cov_pen (bool) – Whether the coverage penalty is None (applying it is a no-op). Note that the converse isn't true: setting beta to 0 should force the coverage penalty to be a no-op.
  - has_len_pen (bool) – Whether the length penalty is None (applying it is a no-op). Note that the converse isn't true: setting alpha to 1 should force the length penalty to be a no-op.
  - coverage_penalty (callable[[FloatTensor, float], FloatTensor]) – Calculates the coverage penalty.
  - length_penalty (callable[[int, float], float]) – Calculates the length penalty.
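A short usage sketch of the builder, calling the documented coverage_penalty and length_penalty callables. The option strings are an assumption (following the OpenNMT-py convention of "wu", "summary", "none" for coverage and "wu", "avg", "none" for length) and the tensor shapes are illustrative.

```python
import torch
from eole.predict.penalties import PenaltyBuilder

# Assumed option strings; check your config for the exact names.
builder = PenaltyBuilder(cov_pen="wu", length_pen="wu")

# Fake attention coverage laid out as (batch_size * beam_size, seq_len).
cov = torch.rand(2 * 5, 17).softmax(dim=-1)

cov_penalty = builder.coverage_penalty(cov, beta=0.2)  # FloatTensor per hypothesis
len_penalty = builder.length_penalty(12, alpha=0.6)    # scalar float

# Beam search then combines these with the hypothesis log-probabilities
# when rescoring candidates.
```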
coverage_none(cov, beta=0.0)
Returns zero as penalty.
coverage_summary(cov, beta=0.0)
Our summary penalty.
coverage_wu(cov, beta=0.0)
GNMT coverage re-ranking score.
See "Google's Neural Machine Translation System" [].
cov is expected to be sized (*, seq_len), where * is probably batch_size x beam_size but could be several dimensions like (batch_size, beam_size). If cov is attention, then the seq_len axis probably sums to (almost) 1.
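For reference, the GNMT paper defines this score from the accumulated attention as beta * Σ_i log(min(cov_i, 1.0)). A minimal tensor sketch follows; the sign convention (returning a positive quantity to be subtracted from the beam score) is an assumption, so check the eole source for the exact form.

```python
import torch

def coverage_wu_sketch(cov, beta=0.0):
    # GNMT-style coverage penalty over the seq_len axis:
    # -beta * sum_i log(min(cov_i, 1.0)), larger = more penalized.
    penalty = -torch.clamp(cov, max=1.0).log().sum(dim=-1)
    return beta * penalty
```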
length_average(cur_len, alpha=1.0)
Returns the current sequence length.
length_none(cur_len, alpha=0.0)
Returns unmodified scores.
length_wu(cur_len, alpha=0.0)
GNMT length re-ranking score.
See "Google's Neural Machine Translation System" [].
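The GNMT length normalizer is lp(Y) = ((5 + |Y|) / 6)^alpha, and beam scores are divided by it. A one-line sketch written from the paper's formula (assumed, not copied from eole):

```python
def length_wu_sketch(cur_len, alpha=0.0):
    # GNMT length penalty; alpha = 0 gives 1.0, i.e. a no-op divisor.
    return ((5.0 + cur_len) / 6.0) ** alpha
```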